Type Graphs in Practice

3 Our Analyser
Aliasing

The basic type graph domain is augmented with definite same-value information, in much the same way as done by Janssens and Bruynooghe [8]. This differs from the pattern domain used for the same purpose in GAIA [2]. The difference is unlikely to affect our results in any significant way, as the properties of the type graph operations do not depend on the aliasing information.

5 Type Graph Widenings

To ensure termination of the analysis it is necessary that there are only finitely many successive approximations of the concrete call and success patterns for a predicate. This property holds trivially for many simple domains, especially where the domain consists of a finite set of values. For type graphs it does not hold (footnote 2), as it is possible to construct an infinite number of type graphs describing successively larger sets of concrete values. For such domains a widening can be used [3] to ensure that any sequence of successively less precise approximations is stationary, i.e., that the analyser reaches a final approximation in a finite number of steps. For type graphs this translates to a method that generalises a type graph by introducing back arcs and/or any-nodes, in the process enlarging the set of concrete terms the type graph describes. Introducing back arcs, which we call folding, makes the type described by a type graph recursive. Typically the widening operation is applied when combining a new and an old call pattern for a predicate, and when combining a new and an old success pattern for a predicate.

Depth-k Restriction

One way of ensuring termination is to limit the number of times a specific functor can occur on a forward path from the root of the type graph (footnote 3) [8, 7]. Assuming a finite number of distinct functors in the analysed program, this implies a finite maximum size of a type graph and thus ensures termination.
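The Depth-k test can be illustrated with a small sketch. It is not the analyser's code: the `Node` class, the `depth_violations` function and the encoding of back arcs as leaf nodes of kind "back" are all assumptions made for the example. The sketch walks forward arcs only and reports every functor that would occur more than k times on a forward path from the root.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical type graph node; children hold forward arcs only."""
    kind: str                      # "functor", "or", "any" or "back"
    name: str = ""                 # name/arity, used by functor nodes
    children: list = field(default_factory=list)


def depth_violations(root, k):
    """Return the functors that occur more than k times on a forward path."""
    found = []

    def walk(node, counts):
        if node.kind == "back":    # back arcs are never traversed
            return
        if node.kind == "functor":
            counts[node.name] += 1
            if counts[node.name] > k:
                found.append(node.name)
                counts[node.name] -= 1
                return             # do not descend below a violating node
        for child in node.children:
            walk(child, counts)
        if node.kind == "functor":
            counts[node.name] -= 1  # undo the count when leaving the path

    walk(root, Counter())
    return found


def functor(name, *children):
    return Node("functor", name, list(children))


# The term shape b(b(b(a))): three occurrences of b/1 on one forward path.
chain = functor("b/1", functor("b/1", functor("b/1", functor("a/0"))))
print(depth_violations(chain, 2))
print(depth_violations(chain, 3))
```

With k = 2 the third b/1 node is reported, while k = 3 accepts the graph. Since each functor can occur at most k times on any forward path, a finite set of functors bounds the length of every forward path.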
When a violation of the depth restriction is detected (footnote 4) it must be resolved by introducing a back arc to an ancestor node with a greater or equal denotation. The existence of such an ancestor node is ensured by forming the upper bound of the selected ancestor node (footnote 5) with the node that violated the depth restriction, and then repeating the process until no violation of the Depth-k restriction remains. We have evaluated the case where k = 2, i.e., when no functor is allowed to appear more than twice on a forward path. In the tables below this method is denoted Depth-2.

Footnote 2: I.e., it does not obey the "finite ascending chain" condition.
Footnote 3: A forward path is a path between two nodes that does not traverse any back arcs.
Footnote 4: This happens surprisingly rarely. When analysing the benchmark nand with Depth-2, less than ten percent of the depth violation tests succeed.
Footnote 5: Selecting a suitable ancestor node can be done in several ways. We currently select the nearest suitable node, where a suitable node is either a functor-node with the same name/arity as the replaced node or an or-node with a child with the same name/arity as the removed node. For all the details see [7].

Topological Clash

In order to better guide the generalisation of the type graphs, Van Hentenryck et al. used both the old type graph and the upper bound of the old and new type graphs for a particular argument position [6]. Comparing the two type graphs, they use the mismatching nodes, which they term topological clashes, to guide the introduction of back arcs. When resolving a topological clash the clashing node is replaced by a back arc to a node with greater denotation than the replaced node or, if no such node exists, an upper bound is formed of the clashing node and a suitable ancestor. To ensure termination this upper bound must decrease the size of the type graph. A simple method is to use an any-node, but it is possible to obtain better results by actually performing an upper bound and enforcing the size limit in some way.
It is not clear what method was used in [6]. Our implementation uses the ordinary upper bound operation and then applies successively more aggressive methods to bring the size below the limit. As a last resort, never encountered in practice, we fall back to using an any-node when the upper bound cannot otherwise be made small enough. In the tables below this method is denoted TC.

Type Jungles

Type jungles ensure finiteness by requiring that a particular functor be represented by identical nodes wherever it appears in a type graph. Thus they can be seen as a further restriction upon Depth-1. Type jungles have the nice property that they allow the upper bound of two (or more) type jungles to be computed using a particularly compact intermediate representation (basically a dictionary with one entry per functor). This means that in the cases where the analyser would normally compute an upper bound of two type graphs and then do a generalisation, it can instead use a specialised upper bound using type jungles to obtain the widened result directly. As used here, we still need to transform the type jungle from its compact representation to a type graph. In a companion paper we investigate an analyser using the compact type jungle representation throughout the analysis [9]. In the tables below this method is denoted Jungle.

6 Additional Heuristics

We also tried some additional methods piggybacked upon the other widenings in an attempt to alleviate some of their problems.

Fold Equal Nodes

The basic upper bound and intersection operations for type graphs do not produce a type graph that is minimal [8]. Thus there may be nodes with a denotation equal to that of an ancestor of the node. It is then possible to replace the node with a back arc to the ancestor, thus producing a type graph with the same denotation but fewer nodes. Unlike the other methods, this method does not change the denotation of the affected type graph.
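Returning to the dictionary representation mentioned for type jungles, the following sketch shows why an upper bound on that representation is simple. It is an illustration under stated assumptions, not the representation used in [9]: a jungle is modelled here as a pair of a root set and a table mapping each functor (name/arity) to one set of allowed functors per argument position, any-nodes are left out, and the name `jungle_upper_bound` is hypothetical.

```python
def jungle_upper_bound(j1, j2):
    """Pointwise union of two jungles, each given as (roots, table)."""
    roots1, table1 = j1
    roots2, table2 = j2
    table = {}
    for functor in table1.keys() | table2.keys():
        args1 = table1.get(functor)
        args2 = table2.get(functor)
        if args1 is None:
            table[functor] = [set(s) for s in args2]
        elif args2 is None:
            table[functor] = [set(s) for s in args1]
        else:
            # Same name/arity, so both lists have the same length.
            table[functor] = [a | b for a, b in zip(args1, args2)]
    return roots1 | roots2, table


# Two approximations of a success pattern: b/1 applied to a, and b/1 applied to b/1.
old = ({"a/0", "b/1"}, {"a/0": [], "b/1": [{"a/0"}]})
new = ({"a/0", "b/1"}, {"a/0": [], "b/1": [{"b/1"}]})

roots, table = jungle_upper_bound(old, new)
print(sorted(roots))
print(sorted(table["b/1"][0]))
```

Because b/1 has a single entry, the merged argument set {a/0, b/1} already describes the recursive type in which the argument of b/1 is again a or b(...), so no separate folding step is needed in this sketch.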
The fold-equal-nodes method is denoted by the subscript EQ.

Fold More Precise Nodes

It is possible to obtain a smaller type graph with a larger denotation by folding nodes that have an ancestor that subsumes them. This is in fact the preferred way to resolve topological clashes in the widening TC, the method of Van Hentenryck et al. The folding was applied immediately before the ordinary widening. When it applies it can significantly reduce the size of a type graph, often creating the same recursive type graph as the analyser would have determined anyway, but using fewer iterations. Use of this method is denoted by the subscript LEQ.

7 Evaluation

The Benchmarks

We have chosen our benchmarks from the Berkeley benchmark suite. This set of benchmarks is widely known, selected to be representative of real Prolog programs, easily obtainable (footnote 6), and can be executed using accompanying input data. The Berkeley benchmarks have been used [5] to evaluate the analyser and compiler in the Aquarius system. We have included the "large" benchmarks from this archive. Additionally we used the file aquarius_compiler.pl (footnote 7), a stand-alone version of the compiler in the Aquarius system. We have not been able to obtain the benchmarks used by Van Hentenryck et al. [6] in their evaluation of the TC widening (footnote 8).

Table 1 lists the benchmarks used and Table 2 gives some indication of the size and structure of each benchmark program. The size measures are: the number of procedures, the number of clauses, the number of goals, and the total number of arguments. The size measures do not include procedures that are unreachable from the main entry point. They also do not include calls to undefined procedures, such as builtins for which the analyser has no special handling.

Footnote 6: Available as .
Footnote 7: Available at . There have been a number of versions of this circulating under the name apc.pl. At least some of them are broken in ways that cause the analyser to determine that the program would fail before doing any useful work.
Footnote 8: The benchmarks available by ftp from Brown University do not seem to be the same as those used in [6], since several size measures differ from those reported for the benchmarks in [6]. Making a meaningful comparison of our results with those reported in [6] using "similar" programs would be difficult, especially for domains as precise as those considered here.

Synthetic Benchmarks

We have also included two very small programs that expose weaknesses in the widenings. tree4 (Figure 1) recognises unary trees with four different node labels. expr (Figure 2) recognises arithmetic expressions.

    main :- tree4(_).

    tree4(a).
    tree4(b(T)) :- tree4(T).
    tree4(c(T)) :- tree4(T).
    tree4(d(T)) :- tree4(T).

Figure 1: tree4.pl

    main :- expr(_).

    % Note: X = 'NUMBER'(_) is used as stand-in for number/1.
    expr(X) :- X = 'NUMBER'(_).
    expr(+(X)) :- expr(X).
    expr(-X) :- expr(X).
    expr(X+Y) :- expr(X), expr(Y).
    expr(X-Y) :- expr(X), expr(Y).
    expr(X*Y) :- expr(X), expr(Y).
    expr(X/Y) :- expr(X), expr(Y).
    expr(X//Y) :- expr(X), expr(Y).
    expr(X mod Y) :- expr(X), expr(Y).
    expr(integer(X)) :- expr(X).
    expr(float(X)) :- expr(X).
    expr(X # Y) :- expr(X), expr(Y).
    expr(X<>Y) :- expr(X), expr(Y).
    expr([X]) :- expr(X).

Figure 2: expr.pl

Measurements

We measured the time taken for the analysis (footnote 9); the number of goals analysed (Iter.), i.e., the number of times a new call pattern caused a procedure to be analysed; the maximum size (footnote 10) of any intermediate type graph, measured on the inputs to (|in|) and the result of (|out|) the upper bound operation; and the maximum size of any type graph in the result (|Result|). We also measured the size of the result from upper bound after folding equal nodes (|min(out)|). For some benchmarks the analysis timed out, using a two minute time limit on all type-graph operations. For these cases we show the intermediate sizes that appeared up to that point.

Footnote 9: The time does not include the time to read and pre-process the analysed program. It also does not include the time for garbage collection; typically this requires an additional 30%.
Footnote 10: The number of nodes and arcs.

    Berkeley Benchmarks
    crypt.pl               Solve a simple crypto-arithmetic puzzle.
    meta_qsort.pl          A meta-interpreter running qsort.
    prover.pl              A simple theorem prover.
    browse.pl              Build and query a database.
    unify.pl               A compiler code generator for unification.
    flatten.pl             Source transformation to remove disjunctions.
    sdda.pl                A dataflow analyser that represents aliasing.
    reducer.pl             A graph reducer based on combinators.
    boyer.pl               An extract from a Boyer-Moore theorem prover.
    simple_analyzer.pl     A dataflow analyser analysing qsort.
    nand.pl                A logic synthesis program based on heuristic search.
    chat_parser.pl         Parse a set of English sentences.

    Other Benchmarks
    aquarius_compiler.pl   Optimising Prolog compiler.
    tree4.pl               Recognise unary trees with four node-types.
    expr.pl                Recognise the basic arithmetic expressions.

Table 1: The benchmarks. For size measures see Table 2.

The Results

We show the results for the unmodified TC and Depth-2 as well as these methods enhanced with folding of equal and more precise nodes. We also show the result obtained by using Jungle, the widening based on type jungles.

Unmodified TC and Depth-2

As can be seen in Table 3 and Table 4, the maximum intermediate type graph size is sometimes much larger than the size of the final type graphs. Both Depth-2 and TC time out on some of the benchmarks, even though TC behaves better. Of particular interest is their behaviour on tree4 and expr. Both these benchmarks were constructed to model programs that occur in practice, but without introducing the precision loss that often helps the analysis terminate on larger programs. tree4 was modelled after meta_qsort and gives the kind of success pattern obtained from a predicate recognising, e.g., the various constructs in a programming language.
    Name                   #Procs  #Clauses  #Goals  #Args
    Berkeley Benchmarks
    crypt.pl                    9        27      29     18
    meta_qsort.pl               8        26      18     10
    prover.pl                  10        33      28     22
    browse.pl                  14        29      29     42
    unify.pl                   29        63      79    141
    flatten.pl                 28        58      55     83
    sdda.pl                    28        77      69     78
    reducer.pl                 30        95      83     95
    boyer.pl                   24       134      34     61
    simple_analyzer.pl         67       136     140    254
    nand.pl                    40       136     189    174
    chat_parser.pl            155       494     345    742
    Other Benchmarks
    aquarius_compiler.pl     1238      3813    4683   4018
    tree4.pl                    2         5       4      1
    expr.pl                     2        16      24      1

Table 2: Size of benchmark programs.

expr is similar but recognises a more realistic sublanguage. A case similar to this occurred when we tried to analyse an explicit definition of the builtin is/2. In contrast to Depth-2, the widening TC handles tree4 nicely. Neither method succeeds in analysing expr, and when they timed out they had reached intermediate type graph sizes in excess of two million, translating to at least half a million nodes. In contrast, the type graph that would eventually result from the analysis of expr would have approximately one node per clause, for a total size below sixty. A polyvariant analyser would likely be more susceptible to the problems exposed by these benchmarks, as it would not lose precision as easily as a monovariant analyser such as ours.

As can be seen, the size of the programs is not a problem in itself. Both the basic Depth-2 and TC can successfully analyse at least some of the larger programs.

Modified TC and Depth-2

Adding the heuristics of folding equal nodes after all operations and of folding nodes to less precise ancestors significantly reduces the size of the intermediate type graphs occurring for several benchmarks. TC_{EQ,LEQ} (Table 5) succeeds in analysing all the benchmarks except chat_parser. In particular, TC_{EQ,LEQ} quickly reaches the expected final result for expr. Depth-2_{EQ,LEQ} (Table 6) still fails to analyse expr. The analysis timed
    TC
    Name                 Iter.     |in|     |out|  |Result|    Time
    Berkeley Benchmarks
    crypt                   19       37        41        33    1.05
    meta_qsort              46      978      1282       279   21.64
    prover                  46      631      4733       100    6.88
    browse                  36      290       442       288  204.82
    unify                   61      127        83        61   2.860
    flatten                 51       39        60        38    1.63
    sdda                    61       83        83        59    2.83
    reducer                 52      137       142       137    2.64
    boyer                   TO    41154    273311       N/A     N/A
    simple_analyzer        121      936       940       451   42.58
    nand                   132     1185      3997       303   39.91
    chat_parser             TO  4602760   4617870       N/A     N/A
    Other Benchmarks
    aquarius_compiler     2734     6523     27070       369  212.14
    tree4                    2       39       317        12    0.15
    expr                    TO  4251693   8503379       N/A     N/A

Table 3: TC with no additional heuristics.

    Depth-2
    Name                 Iter.     |in|     |out|  |Result|    Time
    Berkeley Benchmarks
    crypt                   14       21        25        12    0.89
    meta_qsort              TO     5444     44348       N/A     N/A
    prover                  TO     6885     84881       N/A     N/A
    browse                  25       82        90        48    1.55
    unify                   66      127        82        61    4.44
    flatten                 54       60        61        38    2.36
    sdda                    52       48        49        59    2.41
    reducer                 47      137       137       137    3.12
    boyer                   39      576       582       582    5.05
    simple_analyzer        130      534       584        94    7.99
    nand                   162      633      1533       116   51.13
    Other Benchmarks
    tree4                   TO      144